Large-Scale Content-Based Matching of MIDI and Audio Files

Authors

Colin Raffel

Daniel P. W. Ellis

Abstract

MIDI files, when paired with corresponding audio recordings, can be used as ground truth for many music information retrieval tasks. We present a system which can efficiently match and align MIDI files to entries in a large corpus of audio content based solely on content, i.e., without using any metadata. The core of our approach is a convolutional network-based cross-modality hashing scheme which transforms feature matrices into sequences of vectors in a common Hamming space. Once represented in this way, we can efficiently perform large-scale dynamic time warping searches to match MIDI data to audio recordings. We evaluate our approach on the task of matching a huge corpus of MIDI files to the Million Song Dataset.

1. TRAINING DATA FOR MIR

Central to the task of content-based Music Information Retrieval (MIR) is the curation of ground-truth data for tasks of interest (e.g., timestamped chord labels for automatic chord estimation, beat positions for beat tracking, prominent melody time series for melody extraction, etc.). The quantity and quality of this ground truth is often instrumental in the success of MIR systems which utilize it as training data. Creating appropriate labels for a recording of a given song by hand typically requires person-hours on the order of the duration of the data, and so training data availability is a frequent bottleneck in content-based MIR tasks.

MIDI files that are time-aligned to matching audio can provide ground-truth information [8, 25] and can be utilized in score-informed source separation systems [9, 10]. A MIDI file can serve as a timed sequence of note annotations (a “piano roll”). It is much easier to estimate information such as beat locations, chord labels, or predominant melody from these representations than from an audio signal. A number of tools have been developed for inferring this kind of information from MIDI files [6, 7, 17, 19].

Halevy et al. [11] argue that some of the biggest successes in machine learning came about because “...a large training set of the input-output behavior that we seek to automate is available to us in the wild.” The motivation behind ...

© Colin Raffel, Daniel P. W. Ellis. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Colin Raffel, Daniel P. W. Ellis. “Large-Scale Content-Based Matching of MIDI and Audio Files”, 16th International Society for Music Information Retrieval Conference, 2015.
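The matching step summarized in the abstract (sequences of hash vectors in a common Hamming space, compared by dynamic time warping) can be illustrated with a small sketch. This is not the authors' implementation: the learned convolutional hashing network is not shown, the hash codes are assumed to arrive already packed into Python integers, and the names `hamming_distance`, `dtw_hamming`, and `best_match` are hypothetical. It uses plain, unpruned DTW, which would be far too slow for a corpus the size of the Million Song Dataset.

```python
from typing import Dict, Sequence


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer hash codes."""
    return bin(a ^ b).count("1")


def dtw_hamming(query: Sequence[int], reference: Sequence[int]) -> int:
    """Total Hamming cost of the cheapest DTW alignment of two hash sequences."""
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of query[:i] with reference[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = hamming_distance(query[i - 1], reference[j - 1])
            # Allowed steps: diagonal match, or repeating a frame of either sequence.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return int(cost[n][m])


def best_match(query: Sequence[int], corpus: Dict[str, Sequence[int]]) -> str:
    """Return the key of the corpus entry with the lowest DTW cost (assumed non-empty corpus)."""
    return min(corpus, key=lambda name: dtw_hamming(query, corpus[name]))


# Toy example with made-up 4-bit codes: entry "a" is the query with its first frame repeated.
query = [0b1010, 0b1100, 0b0011]
corpus = {
    "a": [0b1010, 0b1010, 0b1100, 0b0011],
    "b": [0b0101, 0b0011, 0b1100],
}
print(best_match(query, corpus))
```

Packing each hash into an integer lets the per-frame distance reduce to an XOR followed by a bit count, which is the kind of operation that makes Hamming-space comparison cheap.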

Similar resources

Polyphonic Audio Matching for Score Following and Intelligent Audio Editors

Getting computers to understand and process audio recordings in terms of their musical content is a difficult challenge. We describe a method in which general, polyphonic audio recordings of music can be aligned to symbolic score information in standard MIDI files. Because of the difficulties of polyphonic transcription, we perform matching directly on acoustic features that we extract from MID...

On the Computer Recognition of Solo Piano Music

We present work towards a computer system for the automatic transcription of piano performances. The system takes audio files containing polyphonic piano music as input, and produces MIDI output, representing the pitch, timing and volume of the musical notes. The aim of this work is not to reduce the performance data to common music notation, but to extract the performance parameters for a quan...

Melody Matching Directly From Audio

In this paper we explore a technique for content-based music retrieval using a continuous pitch contour derived from a recording of the audio query instead of a quantization of the query into discrete notes. Our system determines the pitch for each unit of time in the query and then uses a time-warping algorithm to match this string of pitches against songs in a database of MIDI files. This tec...

Extracting Ground-Truth Information from MIDI Files: A MIDIfesto

MIDI files abound and provide a bounty of information for music informatics. We enumerate the types of information available in MIDI files and describe the steps necessary for utilizing them. We also quantify the reliability of this data by comparing it to human-annotated ground truth. The results suggest that developing better methods to leverage information present in MIDI files will facilita...

Polyphonic Audio Matching and Alignment for Music Retrieval

We describe a method that aligns polyphonic audio recordings of music to symbolic score information in standard MIDI files without the difficult process of polyphonic transcription. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording.

Publication date: 2015
